{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "Vanilla Youtube Recommender example.ipynb",
      "version": "0.3.2",
      "provenance": [],
      "private_outputs": true,
      "collapsed_sections": [],
      "toc_visible": true,
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "[View in Colaboratory](https://colab.research.google.com/github/whongyi/openrec/blob/master/tutorials/Vanilla_Youtube_Recommender_example.ipynb)"
      ]
    },
    {
      "metadata": {
        "id": "uG4AqP398ndM",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "<p align=\"center\">\n",
        "  <img src =\"https://recsys.acm.org/wp-content/uploads/2017/07/recsys-18-small.png\" height=\"40\" /> <font size=\"4\">Recsys 2018 Tutorial</font>\n",
        "</p>\n",
        "<p align=\"center\">\n",
        "  <font size=\"4\"><b>Modularizing Deep Neural Network-Inspired Recommendation Algorithms</b></font>\n",
        "</p>\n",
        "<p align=\"center\">\n",
        "  <font size=\"4\">Hands on: Customizing Deep YouTube Video Recommendation. Vanilla YouTube Recommender</font>\n",
        "</p>"
      ]
    },
    {
      "metadata": {
        "id": "fx9f__-hL3C2",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "# Install OpenRec and download dataset"
      ]
    },
    {
      "metadata": {
        "id": "iCPtcmnDKsBH",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "!pip install openrec\n",
        "\n",
        "import urllib.request\n",
        "\n",
        "dataset_prefix = 'http://s3.amazonaws.com/cornell-tech-sdl-openrec'\n",
        "urllib.request.urlretrieve('%s/lastfm/lastfm_test.npy' % dataset_prefix, \n",
        "                   'lastfm_test.npy')\n",
        "urllib.request.urlretrieve('%s/lastfm/lastfm_train.npy' % dataset_prefix, \n",
        "                   'lastfm_train.npy')"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "z1hg5H5D9-pO",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "## load lastfm dataset"
      ]
    },
    {
      "metadata": {
        "id": "qd6iP8xyOA5P",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "import numpy as np\n",
        "\n",
        "total_users = 992   \n",
        "total_items = 14598\n",
        "train_data = np.load('lastfm_train.npy')\n",
        "test_data = np.load('lastfm_test.npy')"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "74OuLHTt7gfI",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "train_data[:10], test_data[:10]"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "fRv91Zg9rX-m",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "# Task\n",
        "\n",
        "In this task, we will user listening historys to preict next listens. You will need to implement a vanilla verision of the YouTube video recommender using openrec.tf1."
      ]
    },
    {
      "metadata": {
        "id": "xXV-a9jCQtvh",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "# Implement the Vanilla Youtube Recommender\n",
        "\n",
        "To implement  a model using OpenRec, you will need to first decide how this recommender should be decomposed into subgraphs, i.e., inputgraph, usergraph, itemgraph, interactiongraph and optimizergraph. For example, the training graph of `VanillaYouTubeRec` can be decomposed as follows.\n",
        "\n",
        "<p align=\"center\">\n",
        "  <img src =\"https://s3.amazonaws.com/cornell-tech-sdl-openrec/tutorials/vanilla_youtube_module.png\" height=\"600\" />\n",
        "</p>\n",
        "\n",
        "* **inputgraph**: item consumption history and the groundtruth label.\n",
        "* **usergraph**: left as empty as no user-specific latent factor is needed.\n",
        "* **itemgraph**: extract latent factors for items.\n",
        "* **interactiongraph**: uses MLP and softmax to model user-item interactions.\n",
        "\n",
        "A sample specification of Vanilla Youtube Recommender can be as follows.\n",
        "\n",
        "<p align=\"center\">\n",
        "  <img src =\"https://s3.amazonaws.com/cornell-tech-sdl-openrec/tutorials/vanilla_youtube.png\" height=\"300\" />\n",
        "</p>"
      ]
    },
    {
      "metadata": {
        "id": "uyHpdeO6LawM",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "**Your task **\n",
        "-  fill in the placeholders in the implementation of the `VanillaYouTubeRec` function \n",
        "-  successfully run the experimental code with the recommender you just built. "
      ]
    },
    {
      "metadata": {
        "id": "DybVedLuNe_d",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "from openrec.tf1.recommenders import Recommender\n",
        "from openrec.tf1.modules.extractions import LatentFactor\n",
        "from openrec.tf1.modules.interactions import MLPSoftmax\n",
        "import tensorflow as tf\n",
        "\n",
        "\n",
        "def Tutorial_VanillaYouTubeRec(batch_size, dim_item_embed, max_seq_len, total_items,\n",
        "        l2_reg_embed=None, l2_reg_mlp=None, dropout=None, init_model_dir=None,\n",
        "        save_model_dir='Vanilla_YouTube/', train=True, serve=False):\n",
        "    \n",
        "    rec = Recommender(init_model_dir=init_model_dir,\n",
        "                      save_model_dir=save_model_dir, train=train, serve=serve)\n",
        "\n",
        "    \n",
        "    @rec.traingraph.inputgraph(outs=['seq_item_id', 'seq_len', 'label'])\n",
        "    def train_input_graph(subgraph):\n",
        "      \n",
        "        subgraph['seq_item_id'] = tf.placeholder(tf.int32, \n",
        "                                      shape=[batch_size, max_seq_len],\n",
        "                                      name='seq_item_id')\n",
        "        subgraph['seq_len'] = tf.placeholder(tf.int32, \n",
        "                                      shape=[batch_size], \n",
        "                                      name='seq_len')\n",
        "        subgraph['label'] = tf.placeholder(tf.int32, \n",
        "                                      shape=[batch_size], \n",
        "                                      name='label')\n",
        "        \n",
        "        subgraph.register_global_input_mapping({'seq_item_id': subgraph['seq_item_id'],\n",
        "                                                'seq_len': subgraph['seq_len'],\n",
        "                                                'label': subgraph['label']})\n",
        "        \n",
        "        \n",
        "    @rec.servegraph.inputgraph(outs=['seq_item_id', 'seq_len'])\n",
        "    def serve_input_graph(subgraph):\n",
        "        subgraph['seq_item_id'] = tf.placeholder(tf.int32, \n",
        "                                      shape=[None, max_seq_len],\n",
        "                                      name='seq_item_id')\n",
        "        subgraph['seq_len'] = tf.placeholder(tf.int32, \n",
        "                                      shape=[None],\n",
        "                                      name='seq_len')\n",
        "        subgraph.register_global_input_mapping({'seq_item_id': subgraph['seq_item_id'],\n",
        "                                                'seq_len': subgraph['seq_len']})\n",
        "\n",
        "    \n",
        "    @rec.traingraph.itemgraph(ins=['seq_item_id'], outs=['seq_vec'])\n",
        "    @rec.servegraph.itemgraph(ins=['seq_item_id'], outs=['seq_vec'])\n",
        "    def item_graph(subgraph):\n",
        "        _, subgraph['seq_vec']= LatentFactor(l2_reg=l2_reg_embed,\n",
        "                                      init='normal',\n",
        "                                      id_=subgraph['seq_item_id'],\n",
        "                                      shape=[total_items,dim_item_embed],\n",
        "                                      subgraph=subgraph,\n",
        "                                      scope='item')\n",
        "        \n",
        "    \n",
        "    @rec.traingraph.interactiongraph(ins=['seq_vec', 'seq_len', 'label'])\n",
        "    def train_interaction_graph(subgraph):\n",
        "        MLPSoftmax(user=None,\n",
        "                   item=subgraph['seq_vec'],\n",
        "                   seq_len=subgraph['seq_len'],\n",
        "                   max_seq_len=max_seq_len,\n",
        "                   dims=[dim_item_embed, total_items],\n",
        "                   l2_reg=l2_reg_mlp,\n",
        "                   labels=subgraph['label'],\n",
        "                   dropout=dropout,\n",
        "                   train=True,\n",
        "                   subgraph=subgraph,\n",
        "                   scope='MLPSoftmax'\n",
        "                  )\n",
        "\n",
        "        \n",
        "    @rec.servegraph.interactiongraph(ins=['seq_vec', 'seq_len'])\n",
        "    def serve_interaction_graph(subgraph):\n",
        "        MLPSoftmax(user=None,\n",
        "                   item=subgraph['seq_vec'],\n",
        "                   seq_len=subgraph['seq_len'],\n",
        "                   max_seq_len=max_seq_len,\n",
        "                   dims=[dim_item_embed, total_items],\n",
        "                   l2_reg=l2_reg_mlp,\n",
        "                   train=False,\n",
        "                   subgraph=subgraph,\n",
        "                   scope='MLPSoftmax'\n",
        "                   )\n",
        "\n",
        "        \n",
        "    @rec.traingraph.optimizergraph\n",
        "    def optimizer_graph(subgraph):\n",
        "        losses = tf.add_n(subgraph.get_global_losses())\n",
        "        optimizer = tf.train.AdamOptimizer(learning_rate=0.001)\n",
        "        subgraph.register_global_operation(optimizer.minimize(losses))\n",
        "    \n",
        "    \n",
        "    @rec.traingraph.connector\n",
        "    @rec.servegraph.connector\n",
        "    def connect(graph):\n",
        "        graph.itemgraph['seq_item_id'] = graph.inputgraph['seq_item_id']\n",
        "        graph.interactiongraph['seq_len'] = graph.inputgraph['seq_len']\n",
        "        graph.interactiongraph['seq_vec'] = graph.itemgraph['seq_vec']\n",
        "\n",
        "        \n",
        "    @rec.traingraph.connector.extend\n",
        "    def train_connect(graph):\n",
        "        graph.interactiongraph['label'] = graph.inputgraph['label']\n",
        "\n",
        "\n",
        "    return rec"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "_hxy8mj0xJQ1",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "# Experiement\n",
        "We will use the recommender you implemented to run a toy experiement on the LastFM dataset. "
      ]
    },
    {
      "metadata": {
        "id": "jOwNJ4QF-MCx",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "## preprocessing dataset"
      ]
    },
    {
      "metadata": {
        "id": "olf0fFSTLUrg",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "from openrec.tf1.utils import Dataset\n",
        "\n",
        "train_dataset = Dataset(train_data, total_users, total_items, \n",
        "                        sortby='ts', name='Train')\n",
        "test_dataset = Dataset(test_data, total_users, total_items, \n",
        "                       sortby='ts', name='Test')"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "ralJgDJb-Gn9",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "## hyperparameters and training parameters"
      ]
    },
    {
      "metadata": {
        "id": "M_17lWG_OEhm",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "dim_item_embed = 50     # dimension of item embedding\n",
        "max_seq_len = 100       # the maxium length of user's listen history\n",
        "total_iter = int(1e3)   # iterations for training \n",
        "batch_size = 100        # training batch size\n",
        "eval_iter = 100         # iteration of evaluation\n",
        "save_iter = eval_iter   # iteration of saving model   "
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "eJdIPZZf-Qx2",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "## define sampler\n",
        "We use `TemporalSampler`  and `TemporalEvaluationSampler` to sample sequences of training and testing samples. "
      ]
    },
    {
      "metadata": {
        "id": "HzlKCqgyPRyp",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "from openrec.tf1.utils.samplers import TemporalEvaluationSampler,TemporalSampler\n",
        "\n",
        "train_sampler = TemporalSampler(batch_size=batch_size, max_seq_len=max_seq_len, \n",
        "                                dataset=train_dataset, num_process=1)\n",
        "test_sampler = TemporalEvaluationSampler(dataset=test_dataset, \n",
        "                                         max_seq_len=max_seq_len)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "3ucqUtRd-YZN",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "## define evaluator"
      ]
    },
    {
      "metadata": {
        "id": "HzKk_8lW7Wwf",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "from openrec.tf1.utils.evaluators import AUC, Recall\n",
        "\n",
        "auc_evaluator = AUC()\n",
        "recall_evaluator = Recall(recall_at=[100, 200, 300, 400, 500])"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "5w04YyoE-UEm",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "## define model trainer\n",
        "\n",
        "we used the Vanilla version of the Youtube recommender to train our model."
      ]
    },
    {
      "metadata": {
        "id": "kWX0XpnT7RBE",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "from openrec import ModelTrainer\n",
        "\n",
        "model = Tutorial_VanillaYouTubeRec(batch_size=batch_size,\n",
        "                   total_items=train_dataset.total_items(),\n",
        "                   max_seq_len=max_seq_len,\n",
        "                   dim_item_embed=dim_item_embed,\n",
        "                   save_model_dir='vanilla_youtube_recommender/',\n",
        "                   train=True, \n",
        "                   serve=True)\n",
        "\n",
        "model_trainer = ModelTrainer(model=model)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "FwKd_iFB-thk",
        "colab_type": "text"
      },
      "cell_type": "markdown",
      "source": [
        "## training and testing"
      ]
    },
    {
      "metadata": {
        "id": "q-UffgZW7Rp9",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        "model_trainer.train(total_iter=total_iter, \n",
        "                    eval_iter=eval_iter,\n",
        "                    save_iter=save_iter,\n",
        "                    train_sampler=train_sampler,\n",
        "                    eval_samplers=[test_sampler], \n",
        "                    evaluators=[auc_evaluator, recall_evaluator])"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "metadata": {
        "id": "H8qMdWGI7p1V",
        "colab_type": "code",
        "colab": {}
      },
      "cell_type": "code",
      "source": [
        ""
      ],
      "execution_count": 0,
      "outputs": []
    }
  ]
}